An Integrated Data Analysis Approach to Preprocessing, Visualization and Clustering of Microarray Data

Authors

  • Roberto Amato
  • Angelo Ciaramella
  • Antonino Staiano

Abstract

Microarray technologies are a powerful tool in biological research, but to realize their full potential it is crucial to develop techniques that effectively exploit the huge quantity of data they produce. We propose an innovative tool specifically tailored to perform preprocessing, visualization and clustering on this type of data. The improvements with respect to more traditional techniques are: (a) Preprocessing: a noise estimation method is developed to provide a formal, fast and accurate way to filter out the noisiest genes. The remaining genes are then processed using a nonlinear PCA model, which extracts a smaller number of features from each microarray experiment; (b) Visualization: the preprocessed genes are the input to Probabilistic Principal Surfaces (PPS), a latent variable model that has been shown to be very effective for data mining purposes; (c) Clustering: the trained PPS is used as the basis for a negentropy-based clustering algorithm that automatically computes the number of natural clusters inherently present in the data. All these phases are managed through a graphical user interface, which provides further tools for data mining and lets the user interact dynamically with the data. The tool was tested on yeast gene microarray data in order to find groupings of co-regulated genes.

Similar resources

Modification of the Fast Global K-means Using a Fuzzy Relation with Application in Microarray Data Analysis

Recognizing genes with distinctive expression levels can help in the prevention, diagnosis and treatment of diseases at the genomic level. In this paper, fast Global k-means (fast GKM) is developed for clustering gene expression datasets. Fast GKM is a significant improvement over the k-means clustering method. It is an incremental clustering method which starts with one cluster. Iteratively ...

Clustering and visualization approaches for human cell cycle gene expression data analysis

In this work, a comprehensive multi-step machine learning, data mining and data visualization framework is introduced. The different steps of the approach are: preprocessing, clustering, and visualization. A preprocessing step based on a Robust Principal Component Analysis Neural Network is used for feature extraction from unevenly sampled data. Then a Probabilistic Principal Surfaces approach combined ...

Application of Clustering Methods to DNA Microarrays

Background: Microarray DNA technology has paved the way for investigators to measure the expression of thousands of genes in a short time. Analysis of this large amount of raw data includes normalization, clustering and classification. The present study surveys the application of clustering techniques in microarray DNA analysis. Materials and methods: We analyzed data from the Van’t Veer et al. study dealing with BRCA1...

An integrated heuristic method based on piecewise regression and cluster analysis for fluctuation data (A case study on health-care: Psoriasis patients)

Trend forecasting and a proper understanding of future changes are necessary for planning in the health-care area. One of the problems of analytic methods is determining the number and location of the breakpoints, especially for fluctuation data. Few studies in this area have addressed the case where the number and location of the nodes are not specified. In this paper, a clustering-based method is develo...

NEC for Gene Expression Analysis

The aim of this work is to apply a novel, comprehensive machine learning tool for data mining to the preprocessing and interpretation of gene expression data. Some visualization facilities are also provided. The data mining framework consists of two main parts: a preprocessing phase and a clustering-agglomerating phase. The first phase comprises a noise filtering procedure and a non-linear PCA Neural Net...

Publication date: 2005